Introduce RequestTimer for per-request phase tracking (init, backend, process, total) exposed via Server-Timing response headers. Add benchmark tooling with --profile mode for collecting timing data. Document phased optimization plan covering streaming architecture, code-level fixes, and open design questions for team review.
RequestTimer and Server-Timing header were premature — WASM guest profiling via profile.sh gives better per-function visibility without runtime overhead. Also strips dead --profile mode from benchmark.sh.
build.rs already resolves trusted-server.toml + env vars at compile time and embeds the result. Replace Settings::from_toml() with direct toml::from_str() to skip the config crate pipeline on every request. Profiling confirms: ~5-8% → ~3.3% CPU per request.
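The commit above swaps a layered config pipeline for a single deserialization of the embedded TOML. A rough sketch of the pattern — the field names and the generated-file path are illustrative assumptions, not the actual trusted-server schema:

```rust
// Sketch only: build.rs is assumed to merge trusted-server.toml with env vars
// and write the resolved result into OUT_DIR at compile time.
use serde::Deserialize;

#[derive(Deserialize)]
struct Settings {
    // Illustrative fields, not the real trusted-server schema.
    synthetic_domain: String,
    request_timeout_ms: u64,
}

// The fully resolved TOML is embedded in the binary; no file or env access at runtime.
const EMBEDDED_TOML: &str = include_str!(concat!(env!("OUT_DIR"), "/resolved-settings.toml"));

fn load_settings() -> Result<Settings, toml::de::Error> {
    // One flat parse instead of the config crate's source/merge/convert passes.
    toml::from_str(EMBEDDED_TOML)
}
```

The win comes from doing the source resolution once at build time, so the per-request path is a single `toml::from_str()` over a static string.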
- OPTIMIZATION.md: profiling results, CPU breakdown, phased optimization plan covering streaming fixes, config crate elimination, and stream_to_client() architecture
- scripts/profile.sh: WASM guest profiling via --profile-guest with Firefox Profiler-compatible output
- scripts/benchmark.sh: TTFB analysis, cold start detection, endpoint latency breakdown, and load testing with save/compare support
…ding HTML and RSC Flight URL rewriting, to avoid full-body buffering
Performance Benchmark: HTML Streaming Optimization
We ran a comprehensive apples-to-apples benchmark to measure the impact of this branch. To ensure statistical accuracy:
🚀 The Results: Production
This is the true impact on live users hitting the Fastly Edge:
📉 The Results: Staging
(Note: Total times are higher here because Staging serves a 190KB uncompressed JS bundle, whereas Prod serves a minified 28KB bundle.)
🎯 Conclusion
Exchanging 30ms of trailing transfer time for 15-20ms of upfront TTFB savings is a highly favorable trade for perceived performance. This branch is safe and recommended for merge.
* Optimize Next.js RSC streaming with lazy accumulation

Implement lazy buffering that delays accumulation until RSC content is detected, improving streaming from 0% to 28-37% for RSC pages while maintaining 100% URL rewriting correctness.

- Add needs_accumulation() trait for conditional buffering
- Add 10MB memory limit for DoS protection
- Create integration test suite with real Next.js fixtures
- Add example Next.js app for testing

Performance: RSC pages stream 28-37% (theoretical max), non-RSC 96%.

* Preserve publisher fallback headers, centralize route classification, and always clean up live test temp files
Summary
This PR combines the core publisher-proxy streaming optimization with the
Next.js RSC follow-up work.
At the platform level, Trusted Server moves from a fully buffered proxy model
to chunked streaming using Fastly
stream_to_client(), enabling early header flush and incremental HTML delivery to reduce TTFB and improve subresource
discovery.
On top of that foundation, the HTML pipeline now supports RSC-aware lazy
accumulation: non-RSC content continues to stream immediately, while only RSC
content that requires post-processing is buffered and rewritten safely. This
preserves correctness for fragmented/cross-script RSC payloads while restoring
meaningful streaming behavior.
Key Changes
stream_to_client() Integration (publisher.rs)
Replaced fully buffered response collection with stream_to_client() to enable immediate header dispatch and incremental chunk streaming.
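The buffered-to-streaming swap follows the Fastly SDK's streaming response pattern. A rough sketch, assuming the standard Response::stream_to_client / StreamingBody API — the backend name and the rewrite step are placeholders, not the actual publisher.rs code:

```rust
use std::io::{Read, Write};
use fastly::{Error, Request};

fn stream_proxy(req: Request) -> Result<(), Error> {
    // Fetch from the origin; "origin" is an illustrative backend name.
    let mut beresp = req.send("origin")?;
    let mut upstream = beresp.take_body();

    // Headers go to the client immediately; the body streams afterwards.
    let mut client = beresp.stream_to_client();

    let mut buf = [0u8; 8 * 1024];
    loop {
        let n = upstream.read(&mut buf)?;
        if n == 0 {
            break;
        }
        // HTML rewriting of the chunk would happen here (elided).
        client.write_all(&buf[..n])?;
    }
    client.finish()?; // signal a clean end-of-body to the client
    Ok(())
}
```

The key property is that the client sees status and headers before the origin has finished responding, which is where the TTFB savings come from.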
lol_html Output Pipeline (streaming_processor.rs)
Refactored the HtmlRewriter adapter to implement the OutputSink trait with a shared Rc<RefCell<Vec<u8>>>, enabling true incremental streaming.
Buffer Pre-allocation
Replaced std::mem::take with Vec::with_capacity and std::mem::replace to reduce reallocation churn during chunk processing.
WASM Hostcall Batching
Wrapped StreamingBody output in an 8KB std::io::BufWriter to reduce WASM-to-host boundary crossings.
RSC Lazy Accumulation (html_processor.rs)
Added a conditional accumulation mode that starts buffering only when post-processing is required (for example, RSC placeholders or fragmented scripts). Non-RSC pages continue streaming instead of being fully buffered.
RSC Post-processing Triggers (nextjs integration)
Added needs_accumulation support to integration post-processors and needs_post_processing detection in placeholder state, including fragmented script tracking for fallback re-parse correctness.
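The conditional-buffering hook can be pictured as a trait method that processors opt into. The trait and type names below are illustrative sketches mirroring the needs_accumulation idea, not the real trusted-server trait; the `self.__next_f` marker is the Flight-data push Next.js App Router emits.

```rust
// Illustrative needs_accumulation-style hook: processors opt in to
// buffering, so plain HTML keeps streaming untouched.
trait PostProcessor {
    /// Returns true once this processor needs the document accumulated.
    fn needs_accumulation(&self, chunk: &[u8]) -> bool;
}

struct PlainHtml;
impl PostProcessor for PlainHtml {
    fn needs_accumulation(&self, _chunk: &[u8]) -> bool {
        false // never buffers: plain pages stream straight through
    }
}

struct NextJsRsc;
impl PostProcessor for NextJsRsc {
    fn needs_accumulation(&self, chunk: &[u8]) -> bool {
        // Start buffering once an RSC Flight push marker appears;
        // Next.js App Router emits payloads via self.__next_f.push(...).
        let marker = b"self.__next_f";
        chunk.windows(marker.len()).any(|w| w == marker)
    }
}

// The pipeline buffers only if some processor asks for it.
fn should_accumulate(processors: &[Box<dyn PostProcessor>], chunk: &[u8]) -> bool {
    processors.iter().any(|p| p.needs_accumulation(chunk))
}

fn main() {
    let procs: Vec<Box<dyn PostProcessor>> = vec![Box::new(PlainHtml), Box::new(NextJsRsc)];
    let rsc = b"<script>self.__next_f.push([1,\"a\"])</script>";
    let plain = b"<p>static page</p>";
    println!(
        "rsc={} plain={}",
        should_accumulate(&procs, rsc),
        should_accumulate(&procs, plain)
    );
}
```

Making "don't accumulate" the cheap default is what lets non-RSC pages keep the full streaming path.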
Memory Safety Guardrail
Added a 10MB cap for accumulated post-processed HTML to avoid unbounded
memory growth on large/malicious documents.
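The guardrail amounts to a bounded accumulator. A minimal std-only sketch — the 10MB constant matches the PR, while the struct and error message are illustrative:

```rust
// Minimal sketch of a bounded accumulator: appending past the cap fails
// instead of growing without limit on large or malicious documents.
const MAX_ACCUMULATED_BYTES: usize = 10 * 1024 * 1024; // 10MB cap from the PR

struct BoundedAccumulator {
    buf: Vec<u8>,
}

impl BoundedAccumulator {
    fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Appends a chunk, or returns Err without appending if the cap would be exceeded.
    fn push_chunk(&mut self, chunk: &[u8]) -> Result<(), &'static str> {
        if self.buf.len() + chunk.len() > MAX_ACCUMULATED_BYTES {
            return Err("accumulated HTML exceeds 10MB limit");
        }
        self.buf.extend_from_slice(chunk);
        Ok(())
    }
}

fn main() {
    let mut acc = BoundedAccumulator::new();
    // Filling to exactly the cap is fine; one more byte is rejected.
    assert!(acc.push_chunk(&vec![0u8; MAX_ACCUMULATED_BYTES]).is_ok());
    assert!(acc.push_chunk(b"x").is_err());
    println!("cap enforced at {MAX_ACCUMULATED_BYTES} bytes");
}
```

Checking the limit before the copy keeps the failure path allocation-free, which is the property you want under a DoS-shaped input.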
Routing and Header Consistency (fastly/src/main.rs, publisher.rs)
Centralized route classification and standardized response-header application across buffered and streaming paths.
RSC Fixture/Test Expansion
Added fixture-driven Next.js integration tests (including real Next.js output)
plus a dedicated example app and scripts for fixture capture and live streaming
validation.
Code Health
Resolved associated Clippy warnings and added missing # Errors documentation in streaming-related handlers.
Test Plan
Local Unit & Workspace Tests
Run:
cargo test --workspace
TypeScript Bundle Build
Run:
in crates/js/lib to verify successful generation of integration modules.
Next.js RSC Integration Tests
Run:
cargo test --test nextjs_integration -- --nocapture
to validate URL rewriting correctness and streaming behavior across fixture
sets/chunk sizes.
Local Fastly Simulation
Run:
Verify:
streamed responses and early headers by hand (for example, with curl)
Staging Load Testing
Execute:
against staging to quantify external TTFB and Time-to-Last-Byte (TTLB)
improvements under concurrent traffic.
Closes #320